Add proactive context window management to prevent AI API errors#708
#708 · 2witstudios wants to merge 5 commits into master from
Conversation
Two-pronged fix for AI_APICallError when conversations exceed model context windows (e.g. OpenRouter 400k token limit):

**Prevention (proactive truncation)**

- Move modelMessages conversion to after the system prompt is built so we have accurate token budgets before calling streamText
- Use determineMessagesToInclude() from ai-context-calculator to trim the oldest messages first, reserving 75% of the context window for input and leaving 25% headroom for output tokens and estimator inaccuracies
- Log a warning when truncation occurs, with before/after message counts
- Add OpenRouter provider context windows to getContextWindowSize so common models (DeepSeek, Qwen, Llama, Mistral, Gemini, Claude) get accurate limits instead of the 200k fallback

**Better errors (when context still exceeds the window after truncation)**

- Detect context-length errors in the route error handler and return HTTP 413 with error key 'context_length_exceeded' instead of a generic 500
- Add isContextLengthError() helper to error-messages.ts that matches provider-specific phrasing (OpenRouter, Anthropic, OpenAI, etc.)
- getAIErrorMessage() now returns a clear, actionable message for context errors: "The conversation is too long for this model's context window. Older messages have been trimmed to fit — try sending your message again."
- Replace hardcoded error-message logic in SidebarChatTab with getAIErrorMessage() so all three chat surfaces handle errors consistently

https://claude.ai/code/session_011dBcfJNRsawEMzZLb67Z7m
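The budget math behind the proactive truncation can be sketched as follows. This is an illustrative simplification, not the PR's implementation: the flat message shape and the 4-chars-per-token heuristic are assumptions, and only the names (`determineMessagesToInclude`, the 75/25 split) mirror the description above.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Rough heuristic: ~4 characters per token for Latin text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Reserve 75% of the window for input; the remaining 25% is headroom for
// output tokens and estimator inaccuracies. The system prompt and tool
// definitions are charged against the input budget.
function buildInputBudget(
  contextWindow: number,
  systemPromptTokens: number,
  toolTokens: number,
): number {
  return Math.floor(contextWindow * 0.75) - systemPromptTokens - toolTokens;
}

// Keep the newest messages that fit, dropping the oldest first.
function determineMessagesToInclude(
  messages: ChatMessage[],
  inputBudget: number,
): ChatMessage[] {
  const included: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > inputBudget) break;
    used += cost;
    included.unshift(messages[i]);
  }
  return included;
}
```

When the returned array is shorter than the input, that is the `wasTruncated` case the route logs before calling streamText.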
📝 Walkthrough

Adds proactive context-length management to the AI chat route: computes the model context window and token estimates, truncates messages to fit a 75% input budget, logs truncation metadata, and returns 413 context_length_exceeded for context-limit errors. Also extends provider-specific context-window mappings and centralizes UI error messaging.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant ChatAPI as "AI Chat API Route"
    participant Calculator as "Context Calculator"
    participant Estimator as "Token Estimator"
    participant AIModel as "AI Model"
    participant ErrorHandler as "Error Handler"
    Client->>ChatAPI: POST /api/ai/chat (messages, systemPrompt, model, provider)
    ChatAPI->>Calculator: getContextWindowSize(provider, model)
    Calculator-->>ChatAPI: contextWindow
    ChatAPI->>Estimator: estimateSystemPromptTokens(fullSystemPrompt)
    Estimator-->>ChatAPI: systemPromptTokens
    ChatAPI->>Estimator: estimateToolDefinitionTokens(tools)
    Estimator-->>ChatAPI: toolTokens
    ChatAPI->>ChatAPI: compute inputBudget (75% of contextWindow)
    ChatAPI->>Estimator: determineMessagesToInclude(messages, inputBudget)
    Estimator-->>ChatAPI: includedMessages (truncated if needed)
    alt Truncation occurred
        ChatAPI->>ChatAPI: log truncation metadata (counts, tokens, budgets)
    end
    ChatAPI->>AIModel: stream with modelMessages (includedMessages + system/tool defs)
    alt Context length error from model
        AIModel-->>ErrorHandler: 413 context-length error
        ErrorHandler-->>Client: 413 response { key: "context_length_exceeded", message }
    else Successful stream
        AIModel-->>Client: streamed response
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
packages/lib/src/monitoring/ai-context-calculator.ts (1)
220-243: OpenRouter GPT matching is narrower than the direct-provider GPT block.

The OpenRouter section only matches `gpt-4o` and `gpt-4-turbo` (line 228), while the direct OpenAI block (lines 141-165) also covers GPT-5.x variants. If a user routes a GPT-5 model through OpenRouter, it will fall to the 200k default instead of the actual 272k–400k window. Not a bug per se — 200k is a safe underestimate — but worth noting for future model additions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/lib/src/monitoring/ai-context-calculator.ts` around lines 220-243: update the OpenRouter model-matching branch to include GPT-5 variants so routed GPT-5 models get the same token window as the direct-provider GPT logic. Add checks against `modelLower` for strings like "gpt-5", "gpt-5o", "gpt-5-turbo" (and any project-specific GPT-5 naming supported in the OpenAI block) and return the same token limit used for GPT-5 in the direct OpenAI branch (the same numeric constant used there, e.g. 272_000) instead of falling through to the 200_000 default. Modify the if chain near the other GPT checks (the block referencing `providerLower === 'openrouter'` and `modelLower`) to include these new conditions.

apps/web/src/lib/ai/shared/error-messages.ts (1)
47-62: `isRateLimitError` overlaps with context-length phrases but is currently unused — consider adding defensive logic for future safety.

While `isRateLimitError` checks for the "limit" substring (line 71), which overlaps with "token limit" from `isContextLengthError`, this function is not currently called anywhere in the codebase. The code is safe today because `getAIErrorMessage` doesn't use `isRateLimitError` and checks context length first. However, if `isRateLimitError` is ever used independently in the future, it would misclassify context-length errors. Adding a guard to exclude context-length errors would be defensive refactoring.

♻️ Suggested improvement

```diff
 export function isRateLimitError(errorMessage: string | undefined): boolean {
   if (!errorMessage) return false;
+  // Exclude context-length errors that also contain "limit"
+  if (isContextLengthError(errorMessage)) return false;
   return (
     errorMessage.toLowerCase().includes('rate') ||
     errorMessage.toLowerCase().includes('limit') ||
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/src/lib/ai/shared/error-messages.ts` around lines 47 - 62, isRateLimitError can misclassify context-length issues because it looks for generic "limit" substrings; update isRateLimitError to early-return false when isContextLengthError(errorMessage) is true (or otherwise explicitly exclude context-length phrases) so rate-limit detection never fires for context/window/token-limit errors — locate the isRateLimitError and isContextLengthError functions and add the defensive guard in isRateLimitError (or add an explicit negative-condition for tokens/context/window phrases) to ensure the two checks don't overlap.
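The helper discussed above can be sketched as follows. This is a hedged illustration: the substrings below are examples of provider phrasing named in this thread, not the exact list the real `isContextLengthError()` in error-messages.ts checks.

```typescript
// Illustrative context-length matcher; substring list is an assumption.
function isContextLengthError(errorMessage: string | undefined): boolean {
  if (!errorMessage) return false;
  const msg = errorMessage.toLowerCase();
  return (
    msg.includes("context length") ||   // OpenAI / OpenRouter style
    msg.includes("context window") ||
    msg.includes("maximum context") ||
    msg.includes("token limit") ||
    msg.includes("prompt is too long")  // Anthropic style
  );
}
```

Note that "Rate limit exceeded" contains "limit" but none of these phrases, which is exactly why the rate-limit guard above is only needed in the other direction.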
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/web/src/app/api/ai/chat/route.ts`:
- Around line 829-835: The token estimator (estimateTokens in
ai-context-calculator.ts) uses a fixed 4-chars-per-token heuristic which
undercounts CJK text; update estimateTokens to detect non-ASCII/CJK content
(e.g., check for Unicode ranges or presence of multibyte chars) and use a more
conservative ratio (e.g., 2 chars/token) or make the ratio configurable, then
recalculate calls that rely on it (estimateSystemPromptTokens,
estimateToolDefinitionTokens, and any callers in route.ts) so truncation
reserves enough budget for CJK; keep the existing fallback error handler but
prefer preventing rejections by selecting the conservative ratio when
non-ASCII/CJK is detected.
- Around line 1241-1250: The code currently returns raw provider error text via
errorMsg when isContextLengthError(...) triggers; replace that leak by returning
a sanitized constant or summarized detail (e.g., details: 'context length
exceeded') instead of the raw errorMsg, or call a small sanitizer function
(e.g., sanitizeProviderMessage(errorMsg)) before including it; update the
NextResponse.json call in the isContextLengthError branch to use the
sanitized/constant detail and ensure the user-facing message string "Older
messages have been trimmed to fit — try sending your message again" (from the
proactive truncation logic around the proactive truncation block) is not used if
truncation already ran and likely won’t help—either remove that retry suggestion
or make it conditional based on whether truncation actually occurred.
---
Nitpick comments:
In `@apps/web/src/lib/ai/shared/error-messages.ts`:
- Around line 47-62: isRateLimitError can misclassify context-length issues
because it looks for generic "limit" substrings; update isRateLimitError to
early-return false when isContextLengthError(errorMessage) is true (or otherwise
explicitly exclude context-length phrases) so rate-limit detection never fires
for context/window/token-limit errors — locate the isRateLimitError and
isContextLengthError functions and add the defensive guard in isRateLimitError
(or add an explicit negative-condition for tokens/context/window phrases) to
ensure the two checks don't overlap.
In `@packages/lib/src/monitoring/ai-context-calculator.ts`:
- Around line 220-243: Update the OpenRouter model-matching branch in
ai-context-calculator.ts to include GPT-5 variants so routed GPT-5 models get
the same token window as the direct-provider GPT logic: add checks against
modelLower for strings like "gpt-5", "gpt-5o", "gpt-5-turbo" (and any
project-specific GPT-5 naming you supported in the OpenAI block) and return the
same token limit used for GPT-5 in the direct OpenAI branch (use the same
numeric constant there, e.g., 272_000 or the exact constant used in the OpenAI
block) instead of falling through to the 200_000 default; modify the if chain
near the other GPT checks (the block referencing providerLower === 'openrouter'
and modelLower) to include these new condition(s).
- Fix TypeScript build error: cast includedMessages to UIMessage[] for convertToModelMessages compatibility
- Improve CJK token estimation: use 2 chars/token when >20% non-ASCII content detected (vs 4 chars/token for Latin text)
- Sanitize 413 error response: replace raw provider error with a constant, make the retry suggestion conditional on whether truncation already ran
- Add isContextLengthError guard to isRateLimitError to prevent misclassification of context-length errors as rate-limit errors
- Add GPT-5 variant matching to OpenRouter section of getContextWindowSize

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addressed all CodeRabbit review feedback in 8a5e5d5 (actionable comments, nitpick comments, and the build fix).
The wasTruncated variable was declared inside the try block but referenced in the outer catch block for conditional error messaging. Hoist it to function-level scope so the 413 error response can correctly determine whether truncation already occurred. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
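The hoisting fix described in that commit has this minimal shape. The handler body is illustrative (the real route streams a response and builds a 413 JSON body); only the scoping pattern and the `wasTruncated` name come from the PR.

```typescript
// Minimal sketch of the hoist: declare wasTruncated at function scope,
// before the try, so the catch block can branch on it.
function handleChatError(run: () => string, attemptedTruncation: boolean): string {
  let wasTruncated = false; // function scope: visible to catch
  try {
    wasTruncated = attemptedTruncation;
    return run();
  } catch {
    // The 413 messaging branches on whether truncation already happened.
    return wasTruncated
      ? "Conversation is still too long even after trimming older messages."
      : "The conversation is too long for this model's context window.";
  }
}
```

Had `wasTruncated` stayed inside the try block, the catch reference would not compile, which is the build error the commit fixes.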
Actionable comments posted: 4
🧹 Nitpick comments (1)
packages/lib/src/monitoring/ai-context-calculator.ts (1)
55-66: Consider consolidating token estimation logic into one shared implementation.

`estimateTokens` here now differs from `packages/lib/src/monitoring/ai-monitoring.ts:404-407` (still a fixed 4 chars/token). Keeping two heuristics will let truncation and monitoring metrics drift apart over time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/lib/src/monitoring/ai-context-calculator.ts` around lines 55 - 66, The repository now has two different token-estimation heuristics (estimateTokens in ai-context-calculator.ts and the fixed 4-chars/token logic in ai-monitoring.ts) which can cause drift; replace the duplicated logic by extracting a single shared utility (e.g., export a consolidateEstimateTokens function) and import it from both places so both monitoring and context truncation use the same algorithm (include the non-ASCII detection and charsPerToken decision currently in estimateTokens). Update references to call the shared function (estimateTokens / consolidateEstimateTokens) and remove the old inline heuristic in ai-monitoring.ts to ensure consistent token counting across the codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/web/src/app/api/ai/chat/route.ts`:
- Around line 1249-1252: The error message for the non-truncated branch
incorrectly implies older messages were trimmed; update the logic that builds
the error response (the object using wasTruncated, message, and details in
route.ts) so when wasTruncated is false the message does not say messages were
trimmed—instead return a clear statement like "The conversation is too long for
this model's context window. Please start a new conversation or shorten the
conversation" while preserving details: 'context_length_exceeded'.
- Around line 837-844: After calling determineMessagesToInclude, add a guard
that ensures the current/latest user turn was not dropped: find the last user
message in sanitizedMessages (e.g., lastUserIndex or by role === 'user') and
verify includedMessages is non-empty and contains that message (compare id or
index); if it doesn't, return a clear client error (400) or an explicit error
response indicating the user input exceeds the token budget instead of
proceeding. Update the logic around determineMessagesToInclude /
includedMessages / wasTruncated in route.ts to perform this check and
short-circuit with a helpful error when the latest user turn would be removed.
In `@packages/lib/src/monitoring/ai-context-calculator.ts`:
- Around line 225-255: The OpenRouter-specific model limit block is being
bypassed because earlier generic model-name branches match first; update those
earlier checks to exclude OpenRouter by adding a guard like providerLower !==
'openrouter' to the generic model-name conditionals (the branches that inspect
modelLower for 'gpt', 'claude', 'gemini', etc.) or alternatively move the
OpenRouter block earlier so it runs before the generic branches; locate the
providerLower and modelLower checks in ai-context-calculator.ts and adjust the
conditionals around the generic model checks or reposition the OpenRouter if
(providerLower === 'openrouter') block so OpenRouter-specific mappings are
applied correctly.
- Line 60: The regex /[^\x00-\x7F]/g used to compute nonAsciiCount violates the
linter; replace that regex-based approach with explicit code-point checking:
iterate over the string (e.g., using for...of or Array.from) and count
characters whose codePointAt(0) > 0x7F, then store that count in nonAsciiCount.
Update the expression that computes nonAsciiCount accordingly so it handles full
Unicode code points (surrogate pairs) and removes the problematic regex.
---
Nitpick comments:
In `@packages/lib/src/monitoring/ai-context-calculator.ts`:
- Around line 55-66: The repository now has two different token-estimation
heuristics (estimateTokens in ai-context-calculator.ts and the fixed
4-chars/token logic in ai-monitoring.ts) which can cause drift; replace the
duplicated logic by extracting a single shared utility (e.g., export a
consolidateEstimateTokens function) and import it from both places so both
monitoring and context truncation use the same algorithm (include the non-ASCII
detection and charsPerToken decision currently in estimateTokens). Update
references to call the shared function (estimateTokens /
consolidateEstimateTokens) and remove the old inline heuristic in
ai-monitoring.ts to ensure consistent token counting across the codebase.
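The guard requested in the route.ts inline comment above (never drop the latest user turn during truncation) can be sketched as follows. The message shape and function name here are hypothetical; the real check operates on the route's sanitized message objects.

```typescript
// Illustrative guard: after truncation, verify the newest user message
// survived; otherwise the route should return a 413/400 instead of calling
// the model with an empty or stale context.
interface Msg {
  id: string;
  role: "user" | "assistant";
  content: string;
}

function latestUserMessageIncluded(all: Msg[], included: Msg[]): boolean {
  // Find the last user turn in the full history.
  const lastUser = [...all].reverse().find((m) => m.role === "user");
  if (!lastUser) return true; // nothing to guard
  return included.some((m) => m.id === lastUser.id);
}
```

A false result means the latest input alone exceeds the token budget, which is the "return 413 early" case the review asks for.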
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- apps/web/src/app/api/ai/chat/route.ts
- apps/web/src/lib/ai/shared/error-messages.ts
- packages/lib/src/monitoring/ai-context-calculator.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/web/src/lib/ai/shared/error-messages.ts
```diff
-  return Math.ceil(text.length / 4);
+  // Detect non-ASCII heavy content (CJK, emoji, etc.)
+  // CJK Unified Ideographs, Hiragana, Katakana, Hangul, etc.
+  const nonAsciiCount = (text.match(/[^\x00-\x7F]/g) || []).length;
```
🧩 Analysis chain
🏁 Script executed:
```shell
# Check if the file exists and read the content around line 60
fd "ai-context-calculator.ts" --type f
```

Repository: 2witstudios/PageSpace
🏁 Script executed:
```shell
# Once we confirm the file exists, read it to verify the exact content at line 60
cat -n packages/lib/src/monitoring/ai-context-calculator.ts | sed -n '55,65p'
```
🏁 Script executed:
```shell
# Check for linter configuration in the packages/lib directory
find packages/lib -maxdepth 3 -name "*.json" -o -name "*.mjs" -o -name "biome.json" -o -name ".eslintrc*" | head -20
```
🏁 Script executed:
```shell
# Check if Biome is configured at the root or in packages
find . -maxdepth 2 -name "biome.json" -o -name "biome.jsonc"
```
🏁 Script executed:
```shell
# Read the biome.json configuration
cat biome.json
```
🏁 Script executed:
```shell
# Check the suggested fix in the review comment - verify if the logic is equivalent
python3 << 'EOF'
# Test if the suggested fix produces the same results as the original
# Original: text.match(/[^\x00-\x7F]/g) matches all non-ASCII characters
# Suggested: reduce with codePointAt check for > 0x7f
test_strings = [
    "hello",    # All ASCII
    "hello🌍",  # ASCII + emoji
    "你好",      # Chinese characters
    "café",     # ASCII + accented char
    "مرحبا",    # Arabic
]
for test_str in test_strings:
    # Simulate original regex behavior
    original_count = len([c for c in test_str if ord(c) > 0x7f])
    # Simulate suggested fix behavior
    suggested_count = 0
    for char in test_str:
        code_point = ord(char)
        if code_point > 0x7f:
            suggested_count += 1
    print(f"String: '{test_str}' | Original: {original_count} | Suggested: {suggested_count} | Match: {original_count == suggested_count}")
EOF
```
🏁 Script executed:
```shell
# Check if noControlCharactersInRegex is part of Biome's recommended rules
# and verify if it would be active with the current config
web_search "Biome noControlCharactersInRegex rule"
```
🏁 Script executed:
```shell
# Search for other similar patterns that might have the same issue
rg -n '\\x00|\\x[0-9A-Fa-f]{2}' --type=ts --type=tsx packages/
```
🏁 Script executed:
```shell
# Verify the suggested fix doesn't have any edge cases with surrogate pairs or multi-byte chars
python3 << 'EOF'
# Test edge cases with the suggested fix
test_cases = [
    ("hello", 0),         # All ASCII
    ("café", 1),          # Single accented character
    ("🌍", 1),            # Single emoji (surrogate pair in JS)
    ("你好世界", 4),        # Chinese characters
    ("hello🌍world", 1),  # Mixed ASCII and emoji
    ("\x00\x01\x7f", 0),  # Control characters at edge of range
    ("", 0),              # Empty string
]
for test_str, expected in test_cases:
    # Suggested fix logic
    count = 0
    for char in test_str:
        code_point = ord(char)
        if code_point > 0x7f:
            count += 1
    status = "✓" if count == expected else "✗"
    print(f"{status} String: {repr(test_str)} | Expected: {expected} | Got: {count}")
EOF
```
🏁 Script executed:
```shell
# Search for control character patterns in the codebase using ts file type only
rg -n '\\x00|\\x[0-9A-Fa-f]{2}' --type=ts packages/
```
🏁 Script executed:
```shell
# Verify Biome rule documentation is available and check if recommended includes the suspicious rule
python3 << 'EOF'
# Check Biome's default rule settings for noControlCharactersInRegex
# Based on the biome.json, the "suspicious" category doesn't explicitly disable the rule,
# which means it uses the recommended default
biome_config = {
    "suspicious": {
        "noArrayIndexKey": "off",
        "noExplicitAny": "off",
        "noImplicitAnyLet": "off",
        "noDoubleEquals": "off",
        "noGlobalIsNan": "off",
        "noAssignInExpressions": "off",
        # noControlCharactersInRegex is NOT listed, so it uses the default (recommended = true for suspicious)
    }
}
print("Analysis of biome.json configuration:")
print("- Linter is enabled: true")
print("- 'recommended' is enabled globally for linter rules")
print("- 'suspicious' category does NOT explicitly disable 'noControlCharactersInRegex'")
print("- Therefore, the rule IS active by default")
print("\nThe regex pattern /[^\\x00-\\x7F]/ contains control character escapes,")
print("which would trigger the 'noControlCharactersInRegex' rule in Biome.")
EOF
```
Replace the regex pattern to resolve the noControlCharactersInRegex Biome linter violation.
The pattern /[^\x00-\x7F]/g at line 60 triggers Biome's suspicious/noControlCharactersInRegex rule, which is enabled by default. Use explicit code point checking instead:
Suggested fix
```diff
- const nonAsciiCount = (text.match(/[^\x00-\x7F]/g) || []).length;
+ const nonAsciiCount = [...text].reduce((count, char) => {
+   const codePoint = char.codePointAt(0) ?? 0;
+   return count + (codePoint > 0x7f ? 1 : 0);
+ }, 0);
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```typescript
const nonAsciiCount = [...text].reduce((count, char) => {
  const codePoint = char.codePointAt(0) ?? 0;
  return count + (codePoint > 0x7f ? 1 : 0);
}, 0);
```
🧰 Tools
🪛 Biome (2.4.4)
[error] 60-60: Unexpected control character in a regular expression.
(lint/suspicious/noControlCharactersInRegex)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/lib/src/monitoring/ai-context-calculator.ts` at line 60, The regex
/[^\x00-\x7F]/g used to compute nonAsciiCount violates the linter; replace that
regex-based approach with explicit code-point checking: iterate over the string
(e.g., using for...of or Array.from) and count characters whose codePointAt(0) >
0x7F, then store that count in nonAsciiCount. Update the expression that
computes nonAsciiCount accordingly so it handles full Unicode code points
(surrogate pairs) and removes the problematic regex.
Investigated — the regex /[^\x00-\x7F]/g does not violate any configured lint rule. CI ESLint passed green on this code. The no-control-regex rule (which is likely what was flagged) targets control characters in \x00-\x1F, not the full ASCII range check we're using here.
The regex approach is also slightly more conservative for surrogate pairs (emoji), since each surrogate half counts separately, inflating nonAsciiCount — which makes the token estimate more conservative (safer for truncation). No change needed.
@2witstudios Thank you for investigating this thoroughly! You're absolutely right—I apologize for the false positive. The regex /[^\x00-\x7F]/g is valid and doesn't violate your configured lint rules. The no-control-regex rule (ESLint) indeed targets actual control characters in the \x00-\x1F range, not ASCII boundary checks like this one.
Your point about the conservative behavior with surrogate pairs is excellent—counting each surrogate half separately does inflate nonAsciiCount, which makes the token estimate err on the side of caution for truncation purposes. That's exactly the right approach here.
I've noted this for future reference. Thanks for the clarification!
✏️ Learnings added
Learnt from: 2witstudios
Repo: 2witstudios/PageSpace PR: 708
File: packages/lib/src/monitoring/ai-context-calculator.ts:60-60
Timestamp: 2026-02-27T15:18:17.845Z
Learning: In packages/lib/src/monitoring/ai-context-calculator.ts, the regex `/[^\x00-\x7F]/g` used in estimateTokens() to count non-ASCII characters is valid and does not violate lint rules (CI ESLint passes). The `no-control-regex` rule targets control characters in `\x00-\x1F`, not full ASCII range checks. The regex approach is intentionally conservative for surrogate pairs (emoji), counting each surrogate half separately to inflate nonAsciiCount, making token estimates safer for context truncation.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: 2witstudios
Repo: 2witstudios/PageSpace PR: 258
File: apps/realtime/src/validation.ts:0-0
Timestamp: 2026-01-27T03:45:52.322Z
Learning: Enforce using paralleldrive/cuid2 for ID generation across the TypeScript codebase (not UUIDs). IDs should follow the CUID2 format: lowercase alphanumeric starting with a letter, matching ^[a-z][a-z0-9]{1,31}$ with a maximum length of 32 characters. Audit code paths that generate IDs (e.g., new UUID usages) and replace with cuid2 equivalents; ensure generated IDs are consistently lowercase and validated against the regex, and document any exceptions where IDs may differ in semantic meaning.
Learnt from: 2witstudios
Repo: 2witstudios/PageSpace PR: 699
File: apps/marketing/src/app/docs/self-hosting/environment/page.tsx:34-37
Timestamp: 2026-02-18T05:15:03.695Z
Learning: Ensure that cross-subdomain cookie handling uses two environment variables: COOKIE_DOMAIN (server-side, for Set-Cookie headers in server code like apps/web/src/lib/auth/cookie-config.ts) and NEXT_PUBLIC_COOKIE_DOMAIN (client-side, for document.cookie interactions in theme-cookie.ts in both apps/web and apps/marketing). This pattern should be verified across all related files that set or rely on cookie domains to maintain consistent domain scoping and enable cross-subdomain functionality.
Reviews proactive context window management for AI chat, covering token estimation accuracy, OpenRouter model registry, error handling, type safety, and OWASP compliance. Recommends adding tests and verifying error propagation before merge. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix wasTruncated message inversion: non-truncated error no longer
claims messages were trimmed
- Guard empty messages: return 413 when truncation leaves zero messages
instead of sending empty context to streamText
- Fix estimateMessageTokens: match actual Vercel AI SDK tool-{name}
part format (input/output/state) instead of legacy tool-call/tool-result
- Reconcile context windows: extract MODEL_CONTEXT_WINDOWS into shared
model-context-windows.ts as single source of truth (fixes glm-5
200000→202752, minimax m2.5 via OpenRouter 1M→204800 divergence)
- Fix isContextLengthError: replace broad '413' string match with
specific HTTP status pattern to prevent false positives
- Fix estimateTokens: replace regex array allocation with efficient
charCode loop for CJK detection
- Add 42 unit tests for isContextLengthError, estimateTokens,
estimateMessageTokens, getContextWindowSize, determineMessagesToInclude
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix inverted wasTruncated message logic in catch block 413 response
- Guard against empty includedMessages after truncation (return 413 early)
- Handle tool-{toolName} part format in estimateMessageTokens (PageSpace DB format)
- Extract MODEL_CONTEXT_WINDOWS to shared model-context-windows.ts (single source of truth)
- Narrow isContextLengthError '413' check to require context/token keywords
- Replace regex with for-loop counter in estimateTokens for memory efficiency
- Add unit tests for isContextLengthError and CJK-aware estimateTokens
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Guard against empty truncation: return a 413 error when the latest message alone exceeds the context budget instead of calling the API with no messages
- Fix misleading error message: when wasTruncated is false, no longer claims messages 'have been trimmed'
- Fix OpenRouter model limit shadowing: move the OpenRouter block before the provider-specific branches so models like 'claude'/'gpt'/'gemini' routed via OpenRouter get their correct limits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Exclude 'request body too large' / 'payload too large' from isContextLengthError() so payload 413s aren't misclassified
- Add a dedicated request-size error category in getAIErrorMessage()
- Pass through server-provided context error messages (e.g. "latest message is too large", "even after trimming") instead of replacing them with a generic client-side message
- Update the fallback context-length message to suggest starting a new conversation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
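The first point in that commit, distinguishing payload-size 413s from context-length errors, can be sketched as below. The substrings are the ones quoted in the commit; the context-length phrases are a partial, illustrative list rather than the helper's full set.

```typescript
// Payload-size 413s ("request body too large") must not be classified as
// context-length errors, since trimming messages cannot fix them.
function isRequestTooLargeError(msg: string): boolean {
  const m = msg.toLowerCase();
  return m.includes("request body too large") || m.includes("payload too large");
}

function isContextLengthError(msg: string): boolean {
  // Check the payload-size exclusion first.
  if (isRequestTooLargeError(msg)) return false;
  const m = msg.toLowerCase();
  return (
    m.includes("context length") ||
    m.includes("context window") ||
    m.includes("token limit")
  );
}
```

With this ordering, getAIErrorMessage() can route payload 413s to a dedicated request-size category while context 413s keep the trimming guidance.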
Summary
This PR implements proactive context window management for AI chat conversations to prevent token limit errors from AI providers. Instead of letting conversations fail when they exceed a model's context window, the system now intelligently truncates older messages before sending requests to the API.
Key Changes
Context-aware message truncation: Added logic to `POST /api/ai/chat` that calculates the available token budget based on the model's context window, system prompt size, and tool definitions, then includes only the most recent messages that fit within that budget.

Enhanced context window detection: Expanded `getContextWindowSize()` in `ai-context-calculator.ts` with model-specific context limits for OpenRouter providers (Claude, Gemini, GPT, DeepSeek, Qwen, Llama, Mistral, Grok models), replacing the previous generic 200k fallback.

Context length error handling:
- Added an `isContextLengthError()` utility function to detect context/token limit errors from various AI providers
- The route returns HTTP 413 with a `context_length_exceeded` error code when detected
- Updated `getAIErrorMessage()` to provide user-friendly messaging for context length errors

Improved error messaging in UI: Refactored `SidebarChatTab.tsx` to use the centralized `getAIErrorMessage()` function instead of inline error detection logic, ensuring consistent error messaging across the application.

Implementation Details

Review Feedback Addressed
- Moved the OpenRouter block ahead of the generic branches in `getContextWindowSize()` so model names like `claude`/`gpt`/`gemini` routed via OpenRouter get their correct limits
- `isContextLengthError()` now excludes payload-too-large errors; a dedicated error category was added in `getAIErrorMessage()`
- `isRateLimitError` defensive guard: excludes context-length errors that contain the "limit" substring
- The `details` field returns the constant `'context_length_exceeded'` instead of raw provider messages

How to Validate
Summary by CodeRabbit
New Features
Bug Fixes
Refactor